Combining Probabilistic and Translation-Based Models for Information Retrieval based on Word Sense Annotations
Authors
Abstract
In this paper, we describe our experiments for the robust word sense disambiguation (WSD) track of the CLEF 2009 campaign. This track consists of a monolingual and a bilingual task and addresses information retrieval that utilizes word sense annotations. We took part in the monolingual task only. Our objective was twofold. On the one hand, we intended to increase the precision of WSD through a heuristic-based combination of the annotations of the two WSD systems. For this, we provide an extrinsic evaluation at different levels of word sense accuracy. On the other hand, we aimed at combining a widely used probabilistic model, namely the Divergence From Randomness BM25 model (DFR BM25), with a monolingual translation-based model. Our best-performing system, both with and without word senses, ranked 1st overall in the monolingual task. However, we could not observe any improvement from applying the sense annotations compared to the retrieval settings based on tokens or lemmas only.
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Phrase Identification in Cross-Language Information Retrieval
Term-sense ambiguity and the difficulty in translating phrases are the main sources of problem in dictionary-based cross-language information retrieval (CLIR) approaches. We propose a term similarity-based translationphrase identification technique to enhance the retrieval effectiveness of a dictionary-based query translation method. The technique identifies noun-phrases in the target language b...
Combining Part of Speech Induction and Morphological Induction
Linguistic information is useful in natural language processing, information retrieval and a multitude of sub-tasks involving language analysis. Two types of linguistic information in all languages are part of speech and morphology. Part of speech information reflects syntactic structure and can assist in tasks such as speech recognition, machine translation and word sense disambiguation. Morph...
A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Journal title:
Volume, issue
Pages -
Publication date 2009